AITopics | scan statistic

Collaborating Authors

scan statistic

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Near-optimal Anomaly Detection in Graphs using Lovász Extended Scan Statistic

Neural Information Processing SystemsMar-13-2024, 18:07:30 GMT

The detection of anomalous activity in graphs is a statistical problem that arises in many applications, such as network surveillance, disease outbreak detection, and activity monitoring in social networks. Beyond its wide applicability, graph structured anomaly detection serves as a case study in the difficulty of balancing computational complexity with statistical power. In this work, we develop from first principles the generalized likelihood ratio test for determining if there is a well connected region of activation over the vertices in the graph in Gaussian noise. Because this test is computationally infeasible, we provide a relaxation, called the Lovász extended scan statistic (LESS) that uses submodularity to approximate the intractable generalized likelihood ratio. We demonstrate a connection between LESS and maximum a-posteriori inference in Markov random fields, which provides us with a poly-time algorithm for LESS. Using electrical network theory, we are able to control type 1 error for LESS and prove conditions under which LESS is risk consistent. Finally, we consider specific graph models, the torus, k-nearest neighbor graphs, and ǫ-random graphs. We show that on these graphs our results provide near-optimal performance by matching our results to known lower bounds.

graph, statistic, statistics, (15 more...)

Neural Information Processing Systems

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Industry:

Health & Medicine > Epidemiology (0.54)
Energy > Power Industry (0.34)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback

Detecting Significant Multidimensional Spatial Clusters

Neural Information Processing SystemsApr-6-2023, 15:47:30 GMT

Assume a uniform, multidimensional grid of bivariate data, where each cell of the grid has a count ci and a baseline bi. Our goal is to find spatial regions (d-dimensional rectangles) where the ci are significantly higher than expected given bi. We focus on two applications: detection of clusters of disease cases from epidemiological data (emergency depart- ment visits, over-the-counter drug sales), and discovery of regions of in- creased brain activity corresponding to given cognitive tasks (from fMRI data). Each of these problems can be solved using a spatial scan statistic (Kulldorff, 1997), where we compute the maximum of a likelihood ratio statistic over all spatial regions, and find the significance of this region by randomization. However, computing the scan statistic for all spatial regions is generally computationally infeasible, so we introduce a novel fast spatial scan algorithm, generalizing the 2D scan algorithm of (Neill and Moore, 2004) to arbitrary dimensions.

baseline, dmax, overlap-kd tree, (15 more...)

Neural Information Processing Systems

Industry:

Health & Medicine > Epidemiology (0.68)
Health & Medicine > Therapeutic Area > Neurology (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Score-Based Change Detection for Gradient-Based Learning Machines

Liu, Lang, Salmon, Joseph, Harchaoui, Zaid

arXiv.org Machine LearningJun-26-2021

The widespread use of machine learning algorithms calls for automatic change detection algorithms to monitor their behavior over time. As a machine learning algorithm learns from a continuous, possibly evolving, stream of data, it is desirable and often critical to supplement it with a companion change detection algorithm to facilitate its monitoring and control. We present a generic score-based change detection method that can detect a change in any number of components of a machine learning model trained via empirical risk minimization. This proposed statistical hypothesis test can be readily implemented for such models designed within a differentiable programming framework. We establish the consistency of the hypothesis test and show how to calibrate it to achieve a prescribed false alarm rate. We illustrate the versatility of the approach on synthetic and real data.

hypothesis, lin, statistic, (14 more...)

arXiv.org Machine Learning

2106.14122

Country:

Europe > France > Occitanie > Hérault > Montpellier (0.04)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > New York (0.04)
(3 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Quantifying and Reducing Bias in Maximum Likelihood Estimation of Structured Anomalies

Chitra, Uthsav, Ding, Kimberly, Raphael, Benjamin J.

arXiv.org Machine LearningJul-15-2020

Anomaly estimation, or the problem of finding a subset of a dataset that differs from the rest of the dataset, is a classic problem in machine learning and data mining. In both theoretical work and in applications, the anomaly is assumed to have a specific structure defined by membership in an $\textit{anomaly family}$. For example, in temporal data the anomaly family may be time intervals, while in network data the anomaly family may be connected subgraphs. The most prominent approach for anomaly estimation is to compute the Maximum Likelihood Estimator (MLE) of the anomaly. However, it was recently observed that for some anomaly families, the MLE is an asymptotically $\textit{biased}$ estimator of the anomaly. Here, we demonstrate that the bias of the MLE depends on the size of the anomaly family. We prove that if the number of sets in the anomaly family that contain the anomaly is sub-exponential, then the MLE is asymptotically unbiased. At the same time, we provide empirical evidence that the converse is also true: if the number of such sets is exponential, then the MLE is asymptotically biased. Our analysis unifies a number of earlier results on the bias of the MLE for specific anomaly families, including intervals, submatrices, and connected subgraphs. Next, we derive a new anomaly estimator using a mixture model, and we empirically demonstrate that our estimator is asymptotically unbiased regardless of the size of the anomaly family. We illustrate the benefits of our estimator on both simulated disease outbreak data and a real-world highway traffic dataset.

anomaly, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

2007.07878

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Los Angeles County (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

The Kernel Spatial Scan Statistic

Han, Mingxuan, Matheny, Michael, Phillips, Jeff M.

arXiv.org Machine LearningJun-13-2019

Kulldorff's (1997) seminal paper on spatial scan statistics (SSS) has led to many methods considering different regions of interest, different statistical models, and different approximations while also having numerous applications in epidemiology, environmental monitoring, and homeland security. SSS provides a way to rigorously test for the existence of an anomaly and provide statistical guarantees as to how "anomalous" that anomaly is. However, these methods rely on defining specific regions where the spatial information a point contributes is limited to binary 0 or 1, of either inside or outside the region, while in reality anomalies will tend to follow smooth distributions with decaying density further from an epicenter. In this work, we propose a method that addresses this shortcoming through a continuous scan statistic that generalizes SSS by allowing the point contribution to be defined by a kernel. We provide extensive experimental and theoretical results that shows our methods can be computed efficiently while providing high statistical power for detecting anomalous regions.

artificial intelligence, machine learning, spatial reasoning, (18 more...)

arXiv.org Machine Learning

1906.09381

Country:

North America > United States > District of Columbia > Washington (0.05)
North America > United States > Utah (0.05)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.48)

Add feedback

Semantic Scan: Detecting Subtle, Spatially Localized Events in Text Streams

Maurya, Abhinav, Murray, Kenton, Liu, Yandong, Dyer, Chris, Cohen, William W., Neill, Daniel B.

arXiv.org Machine LearningFeb-13-2016

Early detection and precise characterization of emerging topics in text streams can be highly useful in applications such as timely and targeted public health interventions and discovering evolving regional business trends. Many methods have been proposed for detecting emerging events in text streams using topic modeling. However, these methods have numerous shortcomings that make them unsuitable for rapid detection of locally emerging events on massive text streams. In this paper, we describe Semantic Scan (SS) that has been developed specifically to overcome these shortcomings in detecting new spatially compact events in text streams. Semantic Scan integrates novel contrastive topic modeling with online document assignment and principled likelihood ratio-based spatial scanning to identify emerging events with unexpected patterns of keywords hidden in text streams. This enables more timely and accurate detection and characterization of anomalous, spatially localized emerging events. Semantic Scan does not require manual intervention or labeled training data, and is robust to noise in real-world text data since it identifies anomalous text patterns that occur in a cluster of new documents rather than an anomaly in a single new document. We compare Semantic Scan to alternative state-of-the-art methods such as Topics over Time, Online LDA, and Labeled LDA on two real-world tasks: (i) a disease surveillance task monitoring free-text Emergency Department chief complaints in Allegheny County, and (ii) an emerging business trend detection task based on Yelp reviews. On both tasks, we find that Semantic Scan provides significantly better event detection and characterization accuracy than competing approaches, while providing up to an order of magnitude speedup.

data mining, machine learning, natural language, (21 more...)

arXiv.org Machine Learning

1602.04393

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Genre: Research Report > Promising Solution (0.34)

Industry:

Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Epidemiology (0.88)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.47)
(3 more...)

Add feedback

Mining 911 Calls in New York City: Temporal Patterns, Detection, and Forecasting

Chohlas-Wood, Alex (New York University) | Merali, Aliya (New York University Center) | Reed, Warren (New York University) | Damoulas, Theodoros (New York University)

AAAI ConferencesMar-1-2015

The New York Police Department (NYPD) is tasked with responding to a wide range of incidents that are reported through the city's 911 emergency hotline. Currently, response resources are distributed within police precincts on the basis of high-level summary statistics and expert reasoning. In this paper, we describe our first steps towards a better understanding of 911 call activity: temporal behavioral clustering, predictive models of call activity, and anomalous event detection. In practice, the proposed techniques provide decision makers granular information on resource allocation needs across precincts and are important components of an overall data-driven resource allocation policy.

artificial intelligence, data mining, machine learning, (15 more...)

AAAI Conferences

Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence

Country:

North America > United States > New York > Richmond County > New York City (0.05)
North America > United States > Washington > King County > Bellevue (0.04)

Industry: Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Add feedback

Spatial Scan for Disease Mapping on a Mobile Population

Lan, Liang (Temple University) | Malbasa, Vuk (University of Novi Sad) | Vucetic, Slobodan (Temple University)

AAAI ConferencesJul-14-2014

In disease mapping, the spatial scan statistic is used to detect spatial regions where population is exposed to a significantly higher disease risk than expected. In this important application, the current residence is typically used to define the location of individuals from the population. Considering the mobility of humans at various temporal and spatial scales, using only information about the current residence may be an insufficiently informative proxy because it ignores a multitude of exposures that may occur away from home, or which had occurred at previous residences. In this paper, we propose a spatial scan statistic that is appropriate for disease mapping on mobile populations. We formulate a computationally efficient algorithm that uses the proposed statistic to find significant high-risk regions from mobile population's disease status data. The algorithm is applicable on large populations and over dense spatial grids. The experimental results demonstrate that the proposed algorithm is computationally efficient and outperforms the traditional disease clustering approaches at discovering high-risk regions in mobile populations.

artificial intelligence, data mining, machine learning, (16 more...)

AAAI Conferences

Twenty-Eighth AAAI Conference on Artificial Intelligence

Country:

North America > United States (0.28)
Europe (0.28)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine > Epidemiology (1.00)

Technology:

Information Technology > Data Science > Data Mining (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)

Add feedback

Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic

Sharpnack, James L., Krishnamurthy, Akshay, Singh, Aarti

Neural Information Processing SystemsDec-31-2013

The detection of anomalous activity in graphs is a statistical problem that arises in many applications, such as network surveillance, disease outbreak detection, and activity monitoring in social networks. Beyond its wide applicability, graph structured anomaly detection serves as a case study in the difficulty of balancing computational complexity with statistical power. In this work, we develop from first principles the generalized likelihood ratio test for determining if there is a well connected region of activation over the vertices in the graph in Gaussian noise. Because this test is computationally infeasible, we provide a relaxation, called the Lov\'asz extended scan statistic (LESS) that uses submodularity to approximate the intractable generalized likelihood ratio. We demonstrate a connection between LESS and maximum a-posteriori inference in Markov random fields, which provides us with a poly-time algorithm for LESS. Using electrical network theory, we are able to control type 1 error for LESS and prove conditions under which LESS is risk consistent. Finally, we consider specific graph models, the torus, $k$-nearest neighbor graphs, and $\epsilon$-random graphs. We show that on these graphs our results provide near-optimal performance by matching our results to known lower bounds.

artificial intelligence, data mining, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Industry:

Health & Medicine > Epidemiology (0.54)
Energy > Power Industry (0.34)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback

Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic

Sharpnack, James, Krishnamurthy, Akshay, Singh, Aarti

arXiv.org Machine LearningDec-11-2013

The detection of anomalous activity in graphs is a statistical problem that arises in many applications, such as network surveillance, disease outbreak detection, and activity monitoring in social networks. Beyond its wide applicability, graph structured anomaly detection serves as a case study in the difficulty of balancing computational complexity with statistical power. In this work, we develop from first principles the generalized likelihood ratio test for determining if there is a well connected region of activation over the vertices in the graph in Gaussian noise. Because this test is computationally infeasible, we provide a relaxation, called the Lovasz extended scan statistic (LESS) that uses submodularity to approximate the intractable generalized likelihood ratio. We demonstrate a connection between LESS and maximum a-posteriori inference in Markov random fields, which provides us with a poly-time algorithm for LESS. Using electrical network theory, we are able to control type 1 error for LESS and prove conditions under which LESS is risk consistent. Finally, we consider specific graph models, the torus, k-nearest neighbor graphs, and epsilon-random graphs. We show that on these graphs our results provide near-optimal performance by matching our results to known lower bounds.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Machine Learning

1312.3291

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Epidemiology (0.54)
Energy > Power Industry (0.34)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback